Search | WHO COVID-19 Research Database

Web Page Blocker using Machine Learning Classifier and Chrome Plugin

Gorro, K.; Feliscuzo, L.; Romana, C. L. S.

7th IEEE International Conference on Information Technology and Digital Applications, ICITDA 2022 ; 2022.

Article in English | Scopus | ID: covidwho-2191874

ABSTRACT

In 2020, most Filipinos were using the internet due to COVID-19 pandemic lockdowns. Internet use is not limited to adults, and children might be exposed to online adult content and abuse. Philippine Internet service providers fail to catch pornographic web pages that are not suitable for children. A web page classifier would help in detecting and classifying such pages. In this study, a total of 12,000 web pages with adult content and academic web pages were collected using Scrapy, and existing datasets from DMOZ were used to create a Support Vector Machine (SVM) multi-class classifier. To improve the accuracy of the SVM model, data preprocessing was performed to remove noisy and irrelevant data from the dataset. The text of the web pages was used to train the SVM classifier with three feature representations: Term Frequency-Inverse Document Frequency (TF-IDF), Count Vectorizer, and word2vec skip-gram embeddings with TF-IDF as a multiplier. A series of experiments was conducted to compare these representations. The SVM model built using word2vec with the TF-IDF multiplier outperforms the SVM models built using TF-IDF and Count Vectorizer. The word2vec embedding was generated with a window size of 9 and a vector dimension of 900. The SVM model built using word2vec achieves 86% accuracy. The SVM model is deployed in the Django framework, and a Chrome plugin was created to query the model through a REST API. © 2022 IEEE.
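
The abstract does not spell out how TF-IDF acts as a "multiplier" on the word2vec vectors. One common reading is a TF-IDF-weighted average of the word vectors in each page, which then serves as the feature vector for the SVM. The following is a minimal sketch under that assumption; the function names, the smoothed IDF formula, the normalization by total weight, and the toy two-dimensional embeddings are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def idf_table(docs):
    """Smoothed inverse document frequency for each term in tokenized docs."""
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    return {
        term: math.log((1 + n_docs) / (1 + count)) + 1.0
        for term, count in doc_freq.items()
    }


def tfidf_weighted_vector(doc, idf, embeddings, dim):
    """Average the word vectors of doc, each scaled by its TF-IDF weight.

    Terms missing from the embedding table or the IDF table are skipped.
    Returns a zero vector when no term can be embedded.
    """
    term_counts = Counter(doc)
    total = [0.0] * dim
    weight_sum = 0.0
    for term, count in term_counts.items():
        if term not in embeddings or term not in idf:
            continue
        weight = (count / len(doc)) * idf[term]
        for i, value in enumerate(embeddings[term]):
            total[i] += weight * value
        weight_sum += weight
    if weight_sum == 0.0:
        return total
    return [value / weight_sum for value in total]


# Toy data (assumption): three tokenized pages and made-up 2-d embeddings.
# The paper reports 900-dimensional skip-gram vectors instead.
toy_docs = [
    ["math", "lecture", "notes"],
    ["adult", "video"],
    ["math", "video"],
]
toy_embeddings = {
    "math": [1.0, 0.0],
    "lecture": [0.9, 0.1],
    "notes": [0.8, 0.2],
    "adult": [0.0, 1.0],
    "video": [0.3, 0.7],
}
toy_idf = idf_table(toy_docs)
toy_features = [
    tfidf_weighted_vector(doc, toy_idf, toy_embeddings, dim=2)
    for doc in toy_docs
]
```

Each entry of `toy_features` is a fixed-length vector, so pages of any length could be passed to a classifier such as an SVM in the same way.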

Exploring Natural Language Processing Techniques in Social Media Analysis during a Pandemic: Understanding a corpus of Facebook posts using Word2vec and LDA

Gorro, K. D.; Ali, M. F.; Gorro, K. D.; Ancheta, J. R.

ACM Int. Conf. Proc. Ser. ; PartF168341:69-73, 2020.

Article in English | Scopus | ID: covidwho-1197281

ABSTRACT

People around the world have used social media extensively to communicate and express opinions, especially during the rapid spread of COVID-19. The narratives of social media users are an important resource that can be used in creating measures to curb the deadly disease. However, manually collecting and analyzing data from social media platforms such as Facebook can take time. Thus, this study attempted to use natural language processing (NLP) techniques such as topic modeling and word embedding to identify the concepts contained in the posts and comments of Facebook users in the Philippines regarding the pandemic. This study harvested posts and comments from Facebook groups whose members are primarily Filipino citizens expressing opinions and suggestions on COVID-19 responses. Using Latent Dirichlet Allocation (LDA), this study was able to generate 10 topics related to the concepts of (1) self-discipline, (2) prayers for the frontliners, (3) total lockdown, (4) following government guidelines and protocols, and (5) flattening the curve of the disease. Meanwhile, word groups generated by Word2vec developed concepts such as (1) mass testing, (2) hope for faster recovery, and (3) expectation from the government. The average cosine similarity for the word groups is 0.92, which implies strong relatedness among the words in each group. This study showed that the use of NLP techniques helped in analyzing the themes of Facebook posts and comments related to the pandemic. © 2020 ACM.
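
The abstract reports an average cosine similarity per word group but does not say how the average was taken. A minimal sketch, assuming the mean over all pairs of word vectors within a group, could look like the following; the function names and the toy vectors are hypothetical and do not come from the study.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_u * norm_v)


def average_pairwise_similarity(vectors):
    """Mean cosine similarity over every unordered pair of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("need at least two vectors")
    return sum(cosine_similarity(u, v) for u, v in pairs) / len(pairs)


# Toy word group (assumption): made-up 3-d vectors standing in for
# Word2vec embeddings of related words.
toy_group = [
    [0.9, 0.1, 0.0],
    [0.8, 0.2, 0.1],
    [0.7, 0.3, 0.0],
]
group_score = average_pairwise_similarity(toy_group)
```

A score close to 1 means the vectors in the group point in nearly the same direction, which is the sense in which the abstract reads 0.92 as strong relatedness.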
